Add collector-level na support #541

khusmann · 2024-07-18T01:18:30Z

This PR adds support for collector-level na args (#532). This way, different lists of missing values can be specified for each column, overriding the global na arg in the call to vroom().

Example:

vroom(
  I("a,b,c\na,foo,REFUSED\nb,REFUSED,MISSING\nOMITTED,bar,OMITTED\n"),
  col_types = cols(
    a = col_character(na = "OMITTED"),
    b = col_character(na = "REFUSED"),
    c = col_character()
  ),
  na = "MISSING"
)
#> # A tibble: 3 × 3
#>   a     b     c      
#>   <chr> <chr> <chr>  
#> 1 a     foo   REFUSED
#> 2 b     NA    NA     
#> 3 NA    bar   OMITTED

Without this PR, it is very difficult to efficiently read columns with different lists of missing values. Instead, they have to be loaded as character vectors, then parsed with readr::parse_*() or readr::type_convert(). There are two problems with this:

- Parsing a chr vector after loading with vroom forces the vector to materialize, defeating vroom's lazy-loading altrep goodness.
- vroom's and readr's parsing rules have diverged in subtle ways. For example, type_convert() does not parse the IEEE 754 special double values NaN, Inf, and -Inf (see readr#1526).
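
For reference, here is a rough sketch of the workaround described above, applied to the same data as the example. It is only an illustration: the variable names `raw` and `parsed` are made up for this sketch, and it assumes both vroom and readr are installed. Every column is read as character with no missing-value strings, then each column is re-parsed with its own `na` list.

```r
# Read everything as character and disable NA detection up front
# (na = character() tells vroom that no strings mean "missing").
raw <- vroom::vroom(
  I("a,b,c\na,foo,REFUSED\nb,REFUSED,MISSING\nOMITTED,bar,OMITTED\n"),
  delim = ",",
  col_types = vroom::cols(.default = vroom::col_character()),
  na = character()
)

# Re-parse each column with its own list of missing values.
# Each parse_character() call has to touch every element of the column,
# so the lazily loaded vectors get materialized at this point.
parsed <- raw
parsed$a <- readr::parse_character(raw$a, na = "OMITTED")
parsed$b <- readr::parse_character(raw$b, na = "REFUSED")
parsed$c <- readr::parse_character(raw$c, na = "MISSING")

parsed
```

This gives the same per-column missing values as the collector-level version, but only after a second pass over the data and with readr's parsing rules rather than vroom's.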

I'm hoping you'll consider this PR for inclusion in vroom. It only requires a few changes, is 100% backwards compatible, and adds a feature that cannot otherwise be implemented in a separate package (without duplicating all of vroom's internals). Please let me know if there is anything more I can do to advocate for it. Thank you for your consideration!

khusmann · 2024-07-18T17:27:12Z

Note that this is failing the check for windows-latest (3.6) because the runner is grabbing the latest version of evaluate, which now requires R >= 4.0.0.

khusmann added 3 commits July 17, 2024 16:34

add na argument to collectors

4a92232

add support for collector-level na args to the backend

4c0a3f1

add test for collector-level na args

ca143b4

add test to exercise the col_guess()->resolve_collectors() code path

255cabd
